Data mining is an important step in knowledge discovery that extracts new patterns from datasets. It is a widespread methodology that helps ministries, companies, and experts explore their data to find important insights and patterns that support suitable decisions. Farmers and marketers of dates in the production regions have not been able to discover the most important characteristics of date types from the economic, health, and consumer-type points of view, which they need in order to achieve the highest profits by choosing the best and most consumed types. The research objective is to extract interesting patterns from the dates' product dataset using Machine Learning, based on association rule generation. This, in turn, will support farmers and marketers in discovering new features related to the production, consumption, and marketing processes. This research used a real dataset collected from the Qassim region of KSA, which is the first region for palm cultivation and produces the best types of dates in the Arab region. The data were preprocessed and analyzed by the Apriori algorithm. The results show important features and insights related to the health benefits of dates, their production and consumption, consumer types, and marketing. Consequently, these results can be employed, for instance, to encourage individuals to consume dates for their nutritional value and their important health benefits. Furthermore, the results encourage producers to focus on the production of preferable types and to improve the marketing policies of the other types.

Keywords: Data mining, Dates product, Features extraction, Machine learning
1. INTRODUCTION

Dates data contain hidden patterns that constitute valuable knowledge, including the most produced types, the most consumed types, and the undesirable types. The Qassim region in KSA is one of the largest date-producing areas in the Gulf region and in the world, alongside Iraq. The date industry has become an important food industry [1]. Dates are considered one of the most popular products in KSA and in the Gulf region [2-3], and many research works have concentrated on the management, marketing, and traceability of the date product [4-6] during its sale and export. To date, however, no study has employed data mining or Machine Learning techniques to benefit from the features and characteristics hidden in dates data. The research problem focuses on weaknesses related to the date product: production and marketing are largely unplanned, and the most important characteristics of the product cannot be identified from the economic, health, and consumer-type (male, female, age group, etc.) points of view. Identifying them by analyzing real datasets of this product with machine learning tools would help achieve the highest profits by choosing the best and most consumed types. Several insights and hidden features can be discovered from the analysis of the product data and may help in producing the most important types and then marketing them properly. Until now, these features have not been available, such as consumers' impressions of the product, the best and worst types, and the reasons why some types are consumed and others abandoned. There are also difficulties arising from increased production, which far exceeds domestic demand, while the marketing of undesirable types is clearly weak. By applying recent technology to dates data, the features relevant to the production and consumption processes can be discovered. The importance of this paper comes from the importance of this product globally and especially in KSA, one of the most important regions for palm growing and date production in the world.

This paper aims to extract interesting patterns from the dates' product dataset using Machine Learning (the Apriori algorithm), by extracting the most important knowledge (unseen patterns as new features) in the form of a set of association rules. The proposed solution comprises several tasks: information and data collection, dividing the data into samples, data preprocessing, the mining process, rule generation, validation of results, and accuracy improvement. It is based on the application of the Apriori algorithm [7-9] and the establishment of a set of association rules to help decision-makers in this domain. Association rules are commonly used to find strong relations and important characteristics in data [10-12]. This paper is organized as follows. Section 2 presents the literature review. Section 3 explains the research methodology and results generation. Section 4 describes the validation of the results, section 5 explains rule filtration, and section 6 discusses the results. Section 7 presents the conclusions, and section 8 outlines future work.


2. LITERATURE REVIEW 

Currently, companies and organizations around the world are drowning in data but starving for knowledge. Data can take the form of numerical values, records, figures, text documents, and more complex structures. Complex data may appear in various forms, such as multimedia data, spatial data, and hypertext. To take full advantage of data, it must be retrieved and analysed, but traditional methods are complex and insufficient for that purpose; strong tools are required to discover patterns in raw data. With the massive amount of data stored in files, databases, and data warehouses, it is increasingly important to use effective and powerful tools for data analysis and for the extraction of interesting patterns that help decision-makers. This can be accomplished using Data Mining, which offers effective tools and mechanisms that help miners find the most important patterns in data using Machine Learning algorithms. Machine Learning deals with the development of algorithms, implemented as software, that can learn and extract hidden patterns, features, or relations from datasets. Machine Learning algorithms adjust to changes and improve their performance through the learning and training process. The role of Data Mining is to apply Machine Learning algorithms to data for various purposes, such as prediction, classification, clustering, and the extraction of association rules.

The most common tasks of association rule algorithms are frequent itemset mining and association rule mining. Three classes of these algorithms have been discussed and compared: the Apriori algorithm, the FP-growth algorithm, and the Eclat algorithm [13]. The Eclat algorithm is suitable for big datasets, whereas the Apriori and FP-growth algorithms are better for small datasets, which is why Apriori is used in this research. A typical example of using association rules is to discover which items in a supermarket are normally placed together in a specific customer's market basket. Various approaches are employed for association rule extraction [9]. In Data Mining, datasets can also be employed to compare and select the best methods, such as classifiers and predictors, in order to improve Data Mining techniques and algorithms [14]. One of the common Data Mining algorithms is the Apriori algorithm, which is used for frequent pattern analysis and the extraction of association rules. This algorithm is usually used to generate all significant association rules between items in a database. Currently, many organizations and companies use Data Mining and Machine Learning on a regular basis, including retail stores, schools, banks, and insurance companies. Many of these organizations combine Data Mining with pattern recognition, statistics, and software tools. Data Mining is used to find interesting patterns and relations that would otherwise be difficult to find. It allows data owners to study and understand their customers' behaviour and to make smart marketing decisions for their products and services [15].

Data Mining aims at analyzing historical datasets from different perspectives [16-18] and summarizing the data in new ways that are both clear and useful, in order to increase revenue, cut costs, or both for the data owner [2]. It has become common in both the private and public sectors [19, 20], where various applications are employed in local and global society to enhance services and procedures. Therefore, there is an increasing demand for mining interesting


IJ-AI Vol. 8, No. 3, September 2019: 205 — 214 


IJ-AI ISSN: 2252-8938 o 207 


patterns in datasets. Analyzing such data is computationally very complex when traditional methods are used [21]. In addition, many research works have contributed to this field of study; some focus on data analysis [22-25] and others concentrate on the development and refinement of algorithms [12, 26, 25, 16]. This is because Data Mining is a multidisciplinary field with wide and diverse applications developed for data analysis. In fact, considerable gaps exist between knowledge discovery fundamentals and domain applications. Application domains include the analysis of product data, educational data, the retail industry, spatial-temporal data, and medical data [26]. Several related contributions are similar to this research. For instance, Cornelis studied and analysed the association rules problem for positive and negative values in Big Data [27]. Likewise, Mahmood et al. concentrated on proposing an algorithm for discovering positive and negative association rules among frequent and infrequent itemsets; they identified associations among medical test results using Data Mining algorithms [8]. An association rule generally comprises a set of antecedent parts that lead to a consequent part with a certain confidence. Pazzani and Billsus treated customers' lists of book subjects as transactions, which enabled them to find groups of association rules for subjects that frequently appeared together as part of a customer's interests [28]. Osadchiy et al. proposed an algorithm that builds a model of collective preferences independently of the customer's interests. This requires a simple system of ratings, and the performance of the algorithm was evaluated on a large dataset of transactions from real dietary recalls. They demonstrated that the implementation based on pairwise association rules performs better on the defined task [29]. Our research concentrates on a different idea: it generates association rules from a different kind of data and consequently discovers other types of knowledge.

Other research work has provided a valuable community service. Vasavi used Data Mining algorithms to extract hidden patterns from a 2013 road accident dataset for the highways that pass through Krishna district, India, a heterogeneous dataset collected from police stations. The objective was to find the features shared between accidents. The data were analyzed using Machine Learning algorithms, and the results were sets of association rules generated by the Apriori algorithm [30]. Similarly, Sene et al. worked on association rules, but analyzed a different database describing in-flight medical incidents in order to extract interesting knowledge from that data [7]. Miholca et al. investigated the problem of incremental relational association rule mining. They proposed a new method named "Incremental Relational Association Rule Mining (IRARM)" for incrementally uncovering interesting relational association rules within a dynamic dataset during updates. A number of experiments were carried out to show that the proposed method generates the results more rapidly than re-running the mining algorithm on the extended dataset [31]. An additional approach was presented for mining generalized association rules: an algorithm was developed that scans the database only once and uses the transaction dataset to compute the support of generalized itemsets faster than other similar algorithms [32]. Vidhate and Kulkarni applied an efficient algorithm to a set of data collected from different shops to find a set of frequent items [33]. Fernandez-Basso et al., on the other hand, proposed a parallelized algorithm for association rule extraction using Big Data technologies, which addresses the problems related to massive amounts of data [9].

Sadh and Shukla proposed a mining-based optimization technique for rule generation that combines the Apriori algorithm with an ant colony optimization approach [34]. Prajapati et al., on the other hand, identified consistent and inconsistent association rules from sales data using distributed datasets [21]. A modified form of the frequent itemset mining method was presented that uses an improved formula for generating valid candidates by decreasing the number of invalid candidates. During the generation of the association rule sets, the confidence and support measures were applied [12]. The resulting frequent k-itemset is passed to the association rule generator to create all possible rules [35]. Rajeswari et al. proposed a modified fuzzy Apriori algorithm for rare itemset mining to detect the outliers that represent weak students, based on heap space usage [36]. An additional approach was proposed to extract a set of association rules from medical data, with the objective of selecting the best association rule mining algorithm according to multiple-criteria decision analysis [37]. In this paper, our approach concentrates on the analysis of dates data in order to find interesting patterns within the extracted association rules. These patterns are strongly relevant to the production and consumption of the date product.


3. RESEARCH METHOD 

The overall steps of the methodology are shown in Figure 1. It comprises dataset gathering, data 
preprocessing, mining process, knowledge generation & representation, and accuracy improvement. 
These steps are explained in the following sub-sections: 
Figure 1. Methodology layouts


3.1. Information and Data Collection 

Background information was collected through interviews with a number of people, and the data were collected through an online questionnaire that was designed, evaluated, and distributed to a sample of 640 consumers, producers, marketers, and product managers. The attributes of the collected dataset are presented in Table 1. Some of the collected values were incorrect and others were incomplete or missing, so the data had to be cleaned. After the cleaning process, 499 records remained as the total number of instances used in this research.
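
The paper does not spell out its cleaning rules. As a minimal sketch, assuming a record is kept only when each checked attribute holds one of the values listed in Table 1, the filtering could look like this. The names `ALLOWED`, `is_clean`, and `clean` and the three sample records are hypothetical and are not taken from the study.

```python
# Hypothetical cleaning step: drop survey records with missing or invalid answers.
# Only three attributes are checked here; the value sets come from Table 1.
ALLOWED = {
    "DBM": {"Yes", "No"},
    "CG": {"Female", "Male"},
    "CR": {"Healthy", "Social customs"},
}


def is_clean(record):
    """Return True if every checked attribute is present with an allowed value."""
    return all(record.get(attr) in values for attr, values in ALLOWED.items())


def clean(records):
    """Keep only the records that pass the validity check."""
    return [r for r in records if is_clean(r)]


# Invented sample records, for illustration only.
raw = [
    {"DBM": "Yes", "CG": "Male", "CR": "Healthy"},
    {"DBM": "", "CG": "Female", "CR": "Healthy"},          # missing value
    {"DBM": "No", "CG": "Femal", "CR": "Social customs"},  # incorrect value
]
cleaned = clean(raw)
print(len(raw), "records before cleaning,", len(cleaned), "after")
```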


Table 1. The dataset attributes 


| # | Attribute | Description | Available values for this attribute |
|---|---|---|---|
| 1 | MCK | Annual Consumption in Kilo Gram (KG) | 1-10 Kg, 10-20 Kg, ...more-than-50-kg |
| 2 | NDC | Number of Daily Consumption | One-time, Two-times, Rarely, None |
| 3 | ASR | Spending Rate/family per year | 50-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-800, 800-1000, more-than-1000 |
| 4 | DBM | Dates as a Basic Meal | Yes, No |
| 5 | VF | Value of Food | Yes, No, To-some-extent, don't know |
| 6 | CR | Consumption Reason | Healthy, Social customs |
| 7 | CG | Consumer's Gender | Female, Male |
| 8 | MCT | Most Consumption Type | Ajwa, Barhi, Garawea, Hulwa, Khalas, Nabtat-Ali, Nabtat-Seif, |
| 9 | US | Undesirable Species | Rushodia, Ruthana, Shagra, Sukkari, Wannana, Ajwa, Barhi, Garawea, |
| 10 | MPT | Most Produced Types | Hulwa, Maktomi, Nabtat-Ali, Nabtat-Rashed, Nabtat-Seif, Nabtat-Sultan, Om-hmam, Rushodia, Ruthana, Shagra, Sukkari, Wannana |
| 11 | CA | Consumer's Age | All Ages, All Ages, Elderly, Children, Young |


3.2. Dataset Samples 


The dataset was divided into four samples. The first consists of two attributes, the second includes four attributes, and the third and fourth contain five attributes each. Some samples overlap. The output is a set of rules reflecting some features of the date product relevant to the production and consumption processes.
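
One way to picture the four samples is as projections of each record onto a subset of attributes. The sketch below assumes the attribute groups named for Rule Sets 1 to 4 in Section 3.4 and uses the attribute codes of Table 1; mapping "Monthly Consumption in KG" to the code MCK is an assumption, and `SAMPLES`, `project`, and the example record are hypothetical.

```python
# Attribute groups per sample, using the codes of Table 1 (assumed mapping).
SAMPLES = {
    "Rs1": ["MPT", "US"],
    "Rs2": ["MCT", "MCK", "NDC", "CR"],
    "Rs3": ["ASR", "CA", "VF", "CG", "DBM"],
    "Rs4": ["US", "MPT", "MCT", "MCK", "NDC"],
}


def project(records, attributes):
    """Keep only the given attributes of every record."""
    return [{a: r[a] for a in attributes} for r in records]


# One invented record, for illustration only.
record = {
    "MCK": "1-10 Kg", "NDC": "One-time", "ASR": "100-200", "DBM": "Yes",
    "VF": "Yes", "CR": "Healthy", "CG": "Male", "MCT": "Sukkari",
    "US": "Ajwa", "MPT": "Sukkari", "CA": "Elderly",
}

for name, attributes in SAMPLES.items():
    print(name, project([record], attributes)[0])
```

Rs1 and Rs4 share US and MPT, and Rs2 and Rs4 share MCT, MCK, and NDC, which is the overlap mentioned above.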


3.3. Data Preprocessing 

Preprocessing is the basic step in knowledge discovery using machine learning [38]. It includes various tasks [39], such as removing inconsistent and noisy data, attribute coding, transformation, and loading [40]. This, in turn, improves the data quality and the accuracy of the results. The Apriori algorithm was selected as a useful rule-based technique for discovering strong hidden patterns as a set of rules.
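
The attribute coding step can be illustrated by turning each record into a transaction of "attribute=value" items, which is the form the extracted rules take. This is only a sketch under that assumption; `to_transaction` is a hypothetical helper, not the authors' procedure.

```python
def to_transaction(record):
    """Encode one record as a frozenset of 'attribute=value' items."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())


# Invented record, for illustration only.
example = {"Undesirable Species": "Ajwa", "The Most Produced Types": "Sukkari"}
print(sorted(to_transaction(example)))
```

Using a frozenset makes subset tests cheap, which is what itemset counting relies on.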


3.4. Data Analysis and Rules Generation 

Association rule mining is considered one of the important functions of Data Mining. It includes three types: multilevel association rules, multidimensional association rules, and quantitative association rules. This research uses multilevel association rules. The results of analyzing the four samples are as follows:
Rule Set 1 (Rs1):

This set consists of six rules based on two attributes {The Most Produced Types and Undesirable Species}. The results were generated with both the minimum support and the minimum confidence metric set to 0.1. The six extracted rules are as follows:

a. Undesirable Species=Ajwa 59 → The Most Produced Types=Sukkari 444 conf:(0.93)
b. Undesirable Species=Garawea 52 → The Most Produced Types=Sukkari 444 conf:(0.92)
c. Undesirable Species=Om-hmam 100 → The Most Produced Types=Sukkari 444 conf:(0.94)
d. The Most Produced Types=Sukkari 444 → Undesirable Species=Om-hmam 84 conf:(0.19)
e. The Most Produced Types=Sukkari 444 → Undesirable Species=Ajwa 55 conf:(0.12)
f. The Most Produced Types=Sukkari 444 → Undesirable Species=Garawea 48 conf:(0.11)

Rule Set 2 (Rs2):

It consists of a set of 5 rules generated based on four attributes, which are {Most Consumption Type, Monthly Consumption in KG, Number of daily consumptions, and Consumption Reason}. The minimum support value was 0.3 and the minimum confidence metric was 0.1. These rules are as follows:

a. Number of daily consumptions=One-time 218 → Most Consumption Type=Sukkari 189 conf:(0.87)
b. Consumption Reason=Social traditions 217 → Most Consumption Type=Sukkari 187 conf:(0.86)
c. Consumption Reason=Healthy 282 → Most Consumption Type=Sukkari 240 conf:(0.85)
d. Monthly Consumption in KG=1-10-kg 247 → Most Consumption Type=Sukkari 209 conf:(0.85)
e. Most Consumption Type=Sukkari 427 → Consumption Reason=Healthy 240 conf:(0.56)

Rule Set 3 (Rs3):

It includes seven rules based on the analysis of five attributes {Spending Rate, Consumer Age, Value of Food, Consumer Gender, and Dates as a Basic Meal}. The minimum support value was 0.05 and the minimum confidence metric was 0.9. The best rules extracted are:

a. Spending Rate=more-than-1000-SR Consumer Age=Elderly Consumer Gender=Male Dates as a Basic Meal=Yes 31 → Value of Food=Yes 29 conf:(0.94)
b. Spending Rate=800-1000-SR Consumer Gender=Female 30 → Value of Food=Yes 28 conf:(0.93)
c. Spending Rate=400-500-SR Consumer Age=Elderly 28 → Value of Food=Yes 26 conf:(0.93)
d. Spending Rate=100-200-SR 41 → Value of Food=Yes 38 conf:(0.93)
e. Spending Rate=more-than-1000-SR Consumer Age=Elderly Dates as a Basic Meal=Yes 40 → Value of Food=Yes 37 conf:(0.93)
f. Spending Rate=400-500-SR Dates as a Basic Meal=Yes 30 → Value of Food=Yes 33 conf:(0.92)
g. Spending Rate=more-than-1000-SR Consumer Age=Elderly Consumer Gender=Male 46 → Value of Food=Yes 42 conf:(0.91)

Rule Set 4 (Rs4):

This set of rules was generated based on five attributes that overlap with the previous sets, including {Undesirable Species, The Most Produced Types, Most Consumption Type, Monthly Consumption in KG, and Number of daily Consumption}. The minimum support was 0.1 and the minimum confidence metric was 0.89. The output is a set of 14 rules as follows:

a. Undesirable Species=Ajwa Most Consumption Type=Sukkari 52 → The Most Produced Types=Sukkari 51 conf:(0.98)
b. Most Consumption Type=Sukkari Monthly Consumption in KG=20-30-kg 61 → The Most Produced Types=Sukkari 59 conf:(0.97)
c. Most Consumption Type=Sukkari Number of daily consumptions=Rarely 64 → The Most Produced Types=Sukkari 61 conf:(0.95)
d. Most Consumption Type=Sukkari Monthly Consumption in KG=1-10-kg Number of daily consumptions=One-time 104 → The Most Produced Types=Sukkari 98 conf:(0.94)
e. Monthly Consumption in KG=20-30-kg 69 → The Most Produced Types=Sukkari 65 conf:(0.94)
f. Most Consumption Type=Sukkari Monthly Consumption in KG=1-10-kg 209 → The Most Produced Types=Sukkari 196 conf:(0.94)
g. Undesirable Species=Ajwa 59 → The Most Produced Types=Sukkari 55 conf:(0.93)
h. Most Consumption Type=Sukkari Number of daily consumptions=One-time 189 → The Most Produced Types=Sukkari 176 conf:(0.93)
i. Most Consumption Type=Sukkari 427 → The Most Produced Types=Sukkari 397 conf:(0.93)
j. Monthly Consumption in KG=1-10-kg Number of daily consumptions=One-time 125 → The Most Produced Types=Sukkari 116 conf:(0.93)
k. Undesirable Species=Ajwa The Most Produced Types=Sukkari 55 → Most Consumption Type=Sukkari 51 conf:(0.93)
l. Most Consumption Type=Sukkari Number of daily consumption=Two-times 119 → The Most Produced Types=Sukkari 110 conf:(0.92)
m. Undesirable Species=Om-hmam The Most Produced Types=Sukkari 84 → Most Consumption Type=Sukkari 77 conf:(0.92)
n. Monthly Consumption in KG=1-10-kg Number of daily consumption=Rarely 58 → The Most Produced Types=Sukkari 53 conf:(0.91)
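
Rules of this form can be produced by a level-wise Apriori search followed by a confidence filter. The following is a compact pure-Python sketch of that general technique, not the tool or settings the authors used; the function names `apriori` and `rules` are hypothetical, the toy transactions are invented, and only single-item consequents are generated, as in the rule sets above.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    min_count = min_support * len(transactions)
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) with a single-item consequent."""
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            # The antecedent is frequent too, because every subset of a
            # frequent itemset is frequent.
            conf = count / frequent[antecedent]
            if conf >= min_confidence:
                yield antecedent, frozenset([item]), conf


# Invented toy transactions, for illustration only.
transactions = [
    frozenset({"US=Ajwa", "MPT=Sukkari"}),
    frozenset({"US=Ajwa", "MPT=Sukkari"}),
    frozenset({"US=Om-hmam", "MPT=Sukkari"}),
    frozenset({"US=Ajwa", "MPT=Khalas"}),
]
frequent = apriori(transactions, min_support=0.5)
found = list(rules(frequent, min_confidence=0.6))
for antecedent, consequent, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), f"conf:({conf:.2f})")
```

Lowering `min_support` or `min_confidence` yields more, weaker rules, which is the trade-off behind the different thresholds chosen for Rs1 to Rs4.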


3.5. Measuring Support and Confidence

In this step, the Support and Confidence measures were applied to validate the outputs. Appendix A contains all generated rules with the ranking values of these measures. The values show the importance of each rule relative to the other rules. The formulas of Support and Confidence are given in Formulas (1) and (2), respectively [41-42]. An association rule can be written in the format "IF" part = antecedent, "THEN" part = consequent. The whole dataset was first mined at once, but the resulting rules were limited and covered the date types only partially. This is the justification for dividing the attributes into four samples, which generated a large set of rules, some weak and others strong, followed by a filtration process.


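$$\operatorname{supp}(x \rightarrow y) = \operatorname{supp}(x \cup y) = P(x \cap y) \tag{1}$$


$$\operatorname{conf}(x \rightarrow y) = \frac{\operatorname{supp}(x \rightarrow y)}{\operatorname{supp}(x)} = \frac{\operatorname{supp}(x \cup y)}{\operatorname{supp}(x)} = \frac{P(x \cap y)}{P(x)} = P(y \mid x) \tag{2}$$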
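
As a worked check of Formulas (1) and (2), take rule g of Rule Set 4: 59 records have Undesirable Species=Ajwa, and 55 of them also have The Most Produced Types=Sukkari, out of 499 records in total. The helper names below are hypothetical, and reading the second count as the joint count of antecedent and consequent is an assumption about the rule notation.

```python
def support(joint_count, dataset_size):
    """Formula (1): fraction of records containing both x and y."""
    return joint_count / dataset_size


def confidence(joint_count, antecedent_count):
    """Formula (2): supp(x and y) / supp(x), computed from raw counts."""
    return joint_count / antecedent_count


DATASET_SIZE = 499  # records left after cleaning

print(round(support(55, DATASET_SIZE), 2))  # 0.11
print(round(confidence(55, 59), 2))         # 0.93
```

The confidence of 0.93 matches the value reported for that rule, and the support of about 0.11 is above the 0.1 threshold used for Rule Set 4.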


4. RULES VALIDATION 

To be considered valid, the rules generated from the frequent itemsets must be strong association rules, that is, they must satisfy both minimum support and minimum confidence [42]. The minimum confidence of a rule is a user-defined value, and an association rule is strong if its support is greater than the minimum support value and its confidence is greater than the minimum confidence value [43]. All 32 generated rules are shown in Table 2. All of them have support and confidence values greater than the minimum support and minimum confidence values.


Table 2. All generated rules 


| Index | Rank | Rule Sets Components |
|---|---|---|
| 9 | 522 | Consumption Reason=Healthy 282 → Most Consumption Type=Sukkari 240 conf:(0.85) |
| 8 | 404 | Consumption Reason=Social traditions 217 → Most Consumption Type=Sukkari 187 conf:(0.86) |
| 10 | 456 | Monthly Consumption in KG=1-10-kg 247 → Most Consumption Type=Sukkari 209 conf:(0.85) |
| 28 | 241 | Monthly Consumption in KG=1-10-kg Number of daily consumptions=One-time 125 → The Most Produced Types=Sukkari 116 conf:(0.93) |
| 32 | 111 | Monthly Consumption in KG=1-10-kg Number of daily consumption=Rarely 58 → The Most Produced Types=Sukkari 53 conf:(0.91) |
| 23 | 134 | Monthly Consumption in KG=20-30-kg 69 → The Most Produced Types=Sukkari 65 conf:(0.94) |
| 11 | 667 | Most Consumption Type=Sukkari 427 → Consumption Reason=Healthy 240 conf:(0.56) |
| 27 | 824 | Most Consumption Type=Sukkari 427 → The Most Produced Types=Sukkari 397 conf:(0.93) |
| 24 | 378 | Most Consumption Type=Sukkari Monthly Consumption in KG=1-10-kg 209 → The Most Produced Types=Sukkari 196 conf:(0.94) |
| 22 | 202 | Most Consumption Type=Sukkari Monthly Consumption in KG=1-10-kg Number of daily consumptions=One-time 104 → The Most Produced Types=Sukkari 98 conf:(0.94) |
| 20 | 120 | Most Consumption Type=Sukkari Monthly Consumption in KG=20-30-kg 61 → The Most Produced Types=Sukkari 59 conf:(0.97) |
| 26 | 365 | Most Consumption Type=Sukkari Number of daily consumptions=One-time 189 → The Most Produced Types=Sukkari 176 conf:(0.93) |
| 21 | 125 | Most Consumption Type=Sukkari Number of daily consumptions=Rarely 64 → The Most Produced Types=Sukkari 61 conf:(0.95) |
| 30 | 229 | Most Consumption Type=Sukkari Number of daily consumptions=Two-times 119 → The Most Produced Types=Sukkari 110 conf:(0.92) |
| 7 | 407 | Number of daily consumptions=One-time 218 → Most Consumption Type=Sukkari 189 conf:(0.87) |
| 15 | 79 | Spending Rate=100-200-SR 41 → Value of Food=Yes 38 conf:(0.93) |
| 14 | 54 | Spending Rate=400-500-SR Consumer Age=Elderly 28 → Value of Food=Yes 26 conf:(0.93) |
| 17 | 63 | Spending Rate=400-500-SR Dates as a Basic Meal=Yes 30 → Value of Food=Yes 33 conf:(0.92) |
| 13 | 58 | Spending Rate=800-1000-SR Consumer Gender=Female 30 → Value of Food=Yes 28 conf:(0.93) |
| 18 | 88 | Spending Rate=more-than-1000-SR Consumer Age=Elderly Consumer Gender=Male 46 → Value of Food=Yes 42 conf:(0.91) |
| 12 | 60 | Spending Rate=more-than-1000-SR Consumer Age=Elderly Consumer Gender=Male Dates as a Basic Meal=Yes 31 → Value of Food=Yes 29 conf:(0.94) |
| 16 | 77 | Spending Rate=more-than-1000-SR Consumer Age=Elderly Dates as a Basic Meal=Yes 40 → Value of Food=Yes 37 conf:(0.93) |
| 5 | 499 | The Most Produced Types=Sukkari 444 → Undesirable Species=Ajwa 55 conf:(0.12) |
| 6 | 492 | The Most Produced Types=Sukkari 444 → Undesirable Species=Garawea 48 conf:(0.11) |
| 4 | 528 | The Most Produced Types=Sukkari 444 → Undesirable Species=Om-hmam 84 conf:(0.19) |
| 1 | 503 | Undesirable Species=Ajwa 59 → The Most Produced Types=Sukkari 444 conf:(0.93) |
| 25 | 114 | Undesirable Species=Ajwa 59 → The Most Produced Types=Sukkari 55 conf:(0.93) |
| 19 | 103 | Undesirable Species=Ajwa Most Consumption Type=Sukkari 52 → The Most Produced Types=Sukkari 51 conf:(0.98) |
| 29 | 106 | Undesirable Species=Ajwa The Most Produced Types=Sukkari 55 → Most Consumption Type=Sukkari 51 conf:(0.93) |
| 2 | 496 | Undesirable Species=Garawea 52 → The Most Produced Types=Sukkari 444 conf:(0.92) |
| 3 | 544 | Undesirable Species=Om-hmam 100 → The Most Produced Types=Sukkari 444 conf:(0.94) |
| 31 | 161 | Undesirable Species=Om-hmam The Most Produced Types=Sukkari 84 → Most Consumption Type=Sukkari 77 conf:(0.92) |
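
The strong-rule test described in this section can be sketched as a simple predicate. The function `is_strong` is hypothetical, and it uses "greater than or equal to" for both thresholds, which is the usual convention but an assumption here. The example reuses the counts of rule 25 (59 and 55 out of 499 records) and the Rule Set 4 thresholds of 0.1 and 0.89.

```python
def is_strong(support, confidence, min_support, min_confidence):
    """A rule is strong when it meets both user-defined thresholds."""
    return support >= min_support and confidence >= min_confidence


# Rule 25: Undesirable Species=Ajwa 59 -> The Most Produced Types=Sukkari 55
rule_support = 55 / 499
rule_confidence = 55 / 59
print(is_strong(rule_support, rule_confidence, 0.1, 0.89))

# A confidence of 0.12, as reported for rule 5, would fail the 0.89 threshold.
print(is_strong(rule_support, 0.12, 0.1, 0.89))
```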


5. RULES FILTRATION 

As a first step towards finding the rank values, the support values were normalized to a small range by dividing each value by 499 (the dataset size), so that they are compatible with the Confidence values. The Support and Confidence values were then used to calculate the rank of each rule according to Formula (3). After that, the rules were filtered by removing redundant rules and rules with lower ranks (lower quality). The next step is the selection of the rules with the highest quality, that is, the highest ranks. Figure 2 shows the final results for all generated rules and their ranks; the highest points in the figure represent the highest ranks. The selected rules are those that lie above the value 1.6 on the Y-axis of the chart, and they form the set {1, 2, 3, 7, 8, 9, 10, 12, 16, 18, 24, 26, 27}. This set contains the best rules: 13 rules were found to have the highest ranks, and together they cover all date types included in the research. They represent the final results, as shown in Table 3.


$$\text{Rank} = \frac{\text{Sup}_{\text{consequent}}}{D_s} + \frac{\text{Sup}_{\text{antecedent}}}{D_s} + \text{Confidence} \tag{3}$$


where $D_s$ is the dataset size (499).
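
Formula (3) can be evaluated directly from the two counts printed with each rule and its confidence. In the sketch below, `rank` is a hypothetical helper, and treating the number after the antecedent and the number after the consequent as the two support counts is an assumption about the rule notation. The Rank column of Table 2 appears to hold the unnormalized sum of the two counts (for rule 1, 59 + 444 = 503), so the function divides that sum by the dataset size before adding the confidence.

```python
DATASET_SIZE = 499


def rank(antecedent_count, consequent_count, confidence,
         dataset_size=DATASET_SIZE):
    """Formula (3): normalized supports of both sides plus the confidence."""
    return (consequent_count / dataset_size
            + antecedent_count / dataset_size
            + confidence)


# Rule 1: Undesirable Species=Ajwa 59 -> The Most Produced Types=Sukkari 444
print(round(rank(59, 444, 0.93), 3))

# Rule 3: Undesirable Species=Om-hmam 100 -> The Most Produced Types=Sukkari 444
print(round(rank(100, 444, 0.94), 3))
```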


Table 3. The Final set of rules 


| Index | Rank | Rules |
|---|---|---|
| 1 | 503 | Undesirable Species=Ajwa 59 → The Most Produced Types=Sukkari 444 conf:(0.93) |
| 2 | 496 | Undesirable Species=Garawea 52 → The Most Produced Types=Sukkari 444 conf:(0.92) |
| 3 | 544 | Undesirable Species=Om-hmam 100 → The Most Produced Types=Sukkari 444 conf:(0.94) |
| 7 | 407 | Number of daily consumptions=One-time 218 → Most Consumption Type=Sukkari 189 conf:(0.87) |
| 8 | 404 | Consumption Reason=Social traditions 217 → Most Consumption Type=Sukkari 187 conf:(0.86) |
| 9 | 522 | Consumption Reason=Healthy 282 → Most Consumption Type=Sukkari 240 conf:(0.85) |
| 10 | 456 | Monthly Consumption in KG=1-10-kg 247 → Most Consumption Type=Sukkari 209 conf:(0.85) |
| 16 | 77 | Spending Rate=more-than-1000-SR Consumer Age=Elderly Dates as a Basic Meal=Yes 40 → Value of Food=Yes 37 conf:(0.93) |
| 18 | 88 | Spending Rate=more-than-1000-SR Consumer Age=Elderly Consumer Gender=Male 46 → Value of Food=Yes 42 conf:(0.91) |
| 24 | 378 | Most Consumption Type=Sukkari Monthly Consumption in KG=1-10-kg 209 → The Most Produced Types=Sukkari 196 conf:(0.94) |
| 26 | 365 | Most Consumption Type=Sukkari Number of daily consumptions=One-time 189 → The Most Produced Types=Sukkari 176 conf:(0.93) |
| 27 | 824 | Most Consumption Type=Sukkari 427 → The Most Produced Types=Sukkari 397 conf:(0.93) |

Figure 2. Rules ranking


6. RESULTS DISCUSSION

The results were obtained as a set of association rules, filtered by selecting the high-ranking rules according to the two measures of support and confidence. This process excludes the redundant rules and the rules that only partially cover the required cases. The final results comprise the minimum number of rules that cover most cases of the date types included in this research. The initial set of 32 generated rules, which covers all required cases, was reduced on the basis of the ranking values (keeping the highest) to a small set of 13 very strong rules.

These rules provide the required knowledge given by the initial set, as discussed in the following points:

a. People in KSA eat dates primarily because they believe dates are a healthy meal; the second reason is social tradition, especially with relatives and guests, as shown by rules 8 and 9 in Appendix A and in Table 3, whose ranks are 404 and 522, respectively.

b. The results show that the most consumed type of dates is "Sukkari"; it is the type consumers prefer. All rules show this except rules 16 and 18, whose ranks are low (77 and 88, respectively); see Appendix A.

c. The amount of dates consumed monthly by each family is between 1 and 10 kg; this result is supported by rules 10 and 24, with ranking values of 456 and 378.

d. The most produced type of dates in KSA is "Sukkari", based on rules 1, 2, 3, 24, 26, and 27, which have high ranks based on the support and confidence values (503, 496, 544, 378, 365, and 824).

e. Likewise, the results show that dates are most often consumed once a day, as illustrated by rules 7 and 26, with ranking values of 407 and 365.

f. The spending rate for each family is more than 1000 Saudi Riyal per year, and the most common consumer age group is "Elderly", as seen in rules 16 and 18, with ranks of 77 and 88. In addition, these two rules show that dates have an important food value.

g. Moreover, rule 18 illustrates that consumers are mostly male. Rules 16 and 18 thus give patterns that are more important.

h. The results also show that KSA consumers eat dates as a basic meal; this follows from rule 16, which has a ranking value of 77. Finally, the analysis shows that the most undesirable species of dates are "Om-hmam" in first place, "Ajwa" second, and "Garawea" third. This is supported by rules 1, 2, and 3, which have ranks of 503, 496, and 544, respectively.


7. CONCLUSIONS 

The research results represent a contribution based on cooperation and interaction between the agricultural and information technology fields, serving the community in KSA, the Gulf region, and other date-producing countries around the world. The research provides important knowledge through the application of Data Mining to dates data. It concentrated on extracting new features and insights in order to improve the marketing and production processes of the date product and to understand consumer interests. Based on the extracted sets of association rules shown above and the discussion of those rules, we can reach the following conclusions:

a. The highest-quality type of date product is "Sukkari", because it is the most produced and the most consumed type.

b. The undesirable types of dates are "Om-hmam", "Ajwa", and "Garawea". This, in turn, leads us to the following observations:

c. The focus on one specific type, "Sukkari", which is the most consumed and the most produced type, may result from a strong marketing policy for this type and weak marketing policies for the other types, or it may be a culture inherited from ancestors by their children and grandchildren in this country over past decades.

d. Many producers have a great interest in cultivating specific species of dates although many other species exist, probably because of the benefits they obtain.

e. Prices may play an important role in the decision to buy or avoid a certain type, since they contribute to that product's reputation (cheap or expensive).

f. Undesirable species of dates can be marketed in modern ways, such as through the various social media platforms and television channels, in addition to traditional methods such as using social and cultural events and other occasions to distribute free samples of other species, in order to create a strong consumption culture based on real experience.

g. Most consumers eat dates primarily as a healthy meal and also as a basic meal, especially among the elderly.

h. Saudis spend on average more than SR 1,000 per family per year on dates, and most consumers are male.


8. FUTURE WORK 
The research idea can be extended from different perspectives according to the following points:

- Study more features of the dates' product related to its health benefits.

- Collect inclusive data from the various date-producing regions, and increase the number of attributes used in the analysis.

- Improve the quality of the generated patterns using the Logical Analysis of Data (LAD) method.

- Use the same data to compare the accuracy of the results of various Machine Learning tools, such as those included in RapidMiner, IBM SPSS, Tanagra, Python tools, KNIME, Orange, etc.